Skip to content

TYP: Made ffill_indexer arg-type ndarray #44893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 28, 2021

Conversation

nickleus27
Copy link
Contributor

xref #37715

right_join_indexer = libjoin.ffill_indexer(
right_indexer # type: ignore[arg-type]
)
left_join_indexer = libjoin.ffill_indexer(np.asarray(left_indexer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid calling asarray. This adds unnecessary overhead and does not guarantee that we get an array back if None is passed into it.

You can use assert if it is guaranteed, that left_indexer is not None here. Adjusting the return type of _get_join_info would be even better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl Hi Patrick. I see what you mean about the overhead of np.asarray(). None seems to come from L957 and L962 in _get_join_info which comes from return value of _left_join_on_index L2071. Not sure what a good alternative np.ndarray alternative to None would be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reason about the code s.t. we know we can never get here with left_indexer being None? if so, then can do a left_indexer = cast(np.ndarray, left_indexer). otherwise

if left_indexer is None:
     raise AppropriateException("Useful message")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel thank you for the suggestion. I was unsure if it was okay to use typing.cast() Do these changes look okay?

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Typing type annotations, mypy/pyright type checking labels Dec 27, 2021
@jreback jreback added this to the 1.4 milestone Dec 28, 2021
@jreback jreback merged commit 2f915b3 into pandas-dev:master Dec 28, 2021
@jreback
Copy link
Contributor

jreback commented Dec 28, 2021

thanks @nickleus27

right_join_indexer = libjoin.ffill_indexer(
right_indexer # type: ignore[arg-type]
if left_indexer is None:
raise TypeError("left_indexer cannot be None")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this ever reached? is it because the user passed something invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I was wondering the same thing so I could write a better error message. Tbh I am not sure. I need to spend some more time to see how we get to this point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. this probably shouldn't have been merged until this was sorted out. NBD bc it isn't actively harmful here, but id appreciate it if you'd follow up on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I am having a hard time finding how to get the exception to be thrown. I am working on it, but it seems it will take me a little time.

Copy link
Contributor Author

@nickleus27 nickleus27 Dec 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that this exception situation only happens when the fill_method=='ffill'.
Is the fill_method argument only in pandas.merge_ordered()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the fill_method argument only in pandas.merge_ordered()?

looks like it, yah. i dont know this part of the code particularly well so im just checking docstrings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel i do not think we can ever get to the TyperError thrown on L1651 because the left_indexer is only None when L956 has self.right_index as True. But, pandas.merge_ordered() does not have a right_index argument. I might be misunderstanding what is happening but I do not understand how you can set the fill_method='ffill' that i think is only an argument for merge_ordered but it has no right_index argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. In this case, the check should go right after the _get_join_info call since i dont think it depends on anything after it (double-check me on this pls)

If im looking at this correctly, i think we can also rule out the cases where either left_join_indexer or right_join_indexer is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible for left_indexer and right_indexer to be None, but not when fill_method == "ffill".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants